Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- Free, publicly-accessible full text available July 14, 2026
- Free, publicly-accessible full text available July 14, 2026
- Existing multimodal human action recognition approaches are computationally intensive, which limits their deployment in real-time applications. In this work, we present a novel and efficient pose-driven attention-guided multimodal network (EPAM-Net) for action recognition in videos. Specifically, we propose eXpand temporal Shift (X-ShiftNet) convolutional architectures for the RGB and pose streams to capture spatio-temporal features from RGB videos and their skeleton sequences. X-ShiftNet tackles the high computational cost of 3D CNNs by integrating the Temporal Shift Module (TSM) into an efficient 2D CNN, enabling efficient spatio-temporal learning. Skeleton features then guide the visual stream, focusing it on keyframes and their salient spatial regions through the proposed spatial-temporal attention block. Finally, the predictions of the two streams are fused for classification. Experimental results show that our method outperforms or competes with state-of-the-art methods on the NTU RGB-D 60, NTU RGB-D 120, PKU-MMD, and Toyota SmartHome datasets while significantly reducing floating-point operations (FLOPs). The proposed EPAM-Net provides up to a 72.8x reduction in FLOPs and up to a 48.6x reduction in the number of network parameters. The code will be available at https://github.com/ahmed-nady/Multimodal-ActionRecognition. Free, publicly-accessible full text available February 25, 2026. A minimal sketch of the temporal shift operation follows this entry.
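The entry above hinges on the Temporal Shift Module (TSM), which lets a 2D CNN mix information across frames at almost no extra cost. Below is a minimal PyTorch sketch of the generic TSM operation; the tensor layout and the 1/8 shift fraction follow the original TSM paper, and how EPAM-Net's X-ShiftNet actually configures the shift is an assumption, not taken from the released code.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the time axis.

    x: (batch, time, channels, height, width)
    """
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # first channel slice: shifted forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # second channel slice: shifted backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels: unchanged
    return out

# Example: 2 clips, 8 frames, 64 channels, 56x56 feature maps.
feats = torch.randn(2, 8, 64, 56, 56)
print(temporal_shift(feats).shape)  # torch.Size([2, 8, 64, 56, 56])
```

Inserted between 2D convolutions, this shift is what gives the network a temporal receptive field without 3D convolution, which is where the FLOP savings come from.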
- Abstract. Purpose: This article introduces a novel deep learning approach that substantially improves the accuracy of colon segmentation even with limited data annotation, enhancing the overall effectiveness of the CT colonography pipeline in clinical settings. Methods: The proposed approach integrates 3D contextual information via guided sequential episodic training, in which a query CT slice is segmented by exploiting its previous labeled CT slice (i.e., the support). Segmentation starts by detecting the rectum using a Markov Random Field-based algorithm. Supervised sequential episodic training is then applied to the remaining slices, while contrastive learning enhances feature discriminability and thereby segmentation accuracy. Results: The proposed method, evaluated on 98 abdominal scans of prepped patients, achieved a Dice coefficient of 97.3% and a polyp-information preservation accuracy of 98.28%. Statistical analysis, including 95% confidence intervals, underscores the method's robustness and reliability. Clinically, this level of accuracy is vital for preserving critical polyp details, which are essential for accurate automatic diagnostic evaluation. The method also performs reliably with limited annotated data: it achieved a Dice coefficient of 97.15% when trained on far fewer annotated CT scans (e.g., 10) than it was tested on (e.g., 88). Conclusions: The proposed sequential segmentation approach achieves promising results in colon segmentation. A key strength of the method is its ability to generalize effectively even with limited annotated datasets, a common challenge in medical imaging. Free, publicly-accessible full text available February 1, 2026. A sketch of the support-guided inference walk follows this entry.
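To make the "guided sequential" idea above concrete, here is a hedged sketch of the inference walk it implies: segmentation is seeded at the rectum, and each newly predicted mask serves as the support for the next slice. The `model(query=..., support=..., support_mask=...)` interface and the single forward walk are assumptions based on the abstract, not the authors' actual API.

```python
import torch

def segment_volume(model, volume, rectum_idx, rectum_mask):
    """volume: (num_slices, 1, H, W) tensor; returns {slice index: predicted mask}."""
    masks = {rectum_idx: rectum_mask}  # seed from the MRF-based rectum detection
    # Walk away from the rectum; each prediction supports the next query slice.
    # (The walk toward the other end of the volume is symmetric.)
    with torch.no_grad():
        for i in range(rectum_idx + 1, volume.shape[0]):
            masks[i] = model(query=volume[i], support=volume[i - 1],
                             support_mask=masks[i - 1])
    return masks
```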
- Abstract. Accurate colon segmentation on abdominal CT scans is crucial for various clinical applications. In this work, we propose an accurate approach to colon segmentation from abdominal CT scans. Our architecture incorporates 3D contextual information via sequential episodic training (SET). In each episode, we use two consecutive slices of a CT scan as the support and query samples, together with slices containing no colon regions as negative samples. Choosing consecutive slices as support and query is a sound assumption, as the anatomy of the body does not change abruptly. Unlike traditional few-shot segmentation (FSS) approaches, we use the episodic training strategy in a supervised manner. In addition, to improve the discriminability of the model's learned features, an embedding space is developed using contrastive learning. To guide the contrastive learning process, we use an initial labeling generated by a Markov random field (MRF)-based approach. Finally, in the inference phase, we first detect the rectum, which can be accurately extracted using the MRF-based approach, and then apply SET to the remaining slices. Experiments on our private dataset of 98 CT scans and a public dataset of 30 CT scans show that the proposed FSS model achieves a remarkable validation Dice coefficient (DC) of 97.3% (Jaccard index, JD, 94.5%), compared with 82.1% (JD 70.3%) for classical FSS approaches. Our findings highlight the efficacy of sequential episodic training for accurate 3D medical image segmentation. The code for the proposed models is available at https://github.com/Samir-Farag/ICPR2024. Free, publicly-accessible full text available December 2, 2025. A sketch of the episode construction follows this entry.
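As a hedged illustration of the SET setup described above, the sketch below assembles one training episode: two consecutive labeled slices become the support/query pair, and slices without colon regions supply negatives for the contrastive term. The field names and the number of negatives are illustrative, not taken from the ICPR2024 repository.

```python
import random

def build_episode(slices, masks):
    """slices: list of 2-D CT slices; masks[i] is a colon mask or None if no colon."""
    colon_idx = {i for i, m in enumerate(masks) if m is not None}
    neg_idx = [i for i, m in enumerate(masks) if m is None]
    # Consecutive slices make a sound support/query pair: anatomy changes slowly.
    candidates = [i for i in colon_idx if i + 1 in colon_idx]
    i = random.choice(candidates)
    return {
        "support": (slices[i], masks[i]),
        "query": (slices[i + 1], masks[i + 1]),
        "negatives": [slices[j] for j in random.sample(neg_idx, k=min(2, len(neg_idx)))],
    }
```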
- In this work, we propose a novel method for assessing students' behavioral engagement by representing a student's actions and their frequencies over an arbitrary time interval as a histogram of actions. This histogram, together with the student's gaze, is fed to a classifier that determines whether the student is engaged. For action recognition, we use students' skeletons to model their postures and upper-body movements, and we develop a 3D-CNN model to learn the dynamics of a student's upper body. The trained 3D-CNN recognizes actions within every 2-minute video segment, and these actions are then used to build the histogram of actions. To evaluate the proposed framework, we built a dataset of 1414 video segments annotated with 13 actions and 963 2-minute video segments annotated with two engagement levels. Experimental results indicate that student actions can be recognized with a top-1 accuracy of 86.32% and that the proposed framework captures the average engagement of the class with a 90% F1-score. Free, publicly-accessible full text available November 13, 2025. A sketch of the histogram-of-actions feature follows this entry.
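The histogram-of-actions representation above reduces to a short piece of code. The sketch below assumes the 13 actions are integer-coded and the gaze cue is a small numeric vector; the normalization step is our assumption, added so that intervals of different lengths remain comparable.

```python
import numpy as np

NUM_ACTIONS = 13  # size of the action vocabulary in the dataset above

def engagement_feature(action_ids, gaze_feature):
    """action_ids: int array of recognized actions; gaze_feature: 1-D float array."""
    hist = np.bincount(action_ids, minlength=NUM_ACTIONS).astype(float)
    hist /= max(hist.sum(), 1.0)  # assumed: normalize counts into frequencies
    return np.concatenate([hist, gaze_feature])  # input vector for the classifier

# Example: actions recognized over one interval, plus a 2-D gaze cue.
x = engagement_feature(np.array([0, 3, 3, 7, 12]), np.array([0.2, 0.8]))
print(x.shape)  # (15,)
```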
- Early diagnosis of colorectal polyps, before they turn into cancer, is one of the main keys to treatment. In this work, we propose a framework that helps radiologists read CT scans and identify candidate CT slices containing polyps. The proposed colorectal polyp detection approach consists of two cascaded stages. In the first stage, a CNN-based model is trained and validated to detect polyps in axial CT slices. To narrow the effective receptive field of the detector neurons, the colon regions are segmented and fed into the network instead of the original CT slice; this drastically improves detection and localization, increasing mAP by 36%. To reduce the false positives generated by the detector, the second stage uses a proposed multi-view network (MVN) to classify polyp candidates. The MVN classifier is trained on the sagittal and coronal views corresponding to the detected axial views. The approach is tested on 50 CTC-annotated cases, and the experimental results confirm that after the classification stage, polyps can be detected with an AUC of about 95.27%. A sketch of the first-stage colon-region preprocessing follows this entry.
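The receptive-field trick in the first stage above amounts to masking and cropping each axial slice to the segmented colon before detection. A minimal sketch, with the padding and background-fill choices being our assumptions rather than the paper's exact scheme:

```python
import numpy as np

def crop_to_colon(ct_slice, colon_mask, pad=8):
    """ct_slice, colon_mask: 2-D arrays of equal shape; returns the masked crop or None."""
    ys, xs = np.nonzero(colon_mask)
    if ys.size == 0:
        return None  # no colon in this slice; nothing to feed the detector
    masked = np.where(colon_mask > 0, ct_slice, ct_slice.min())  # suppress background tissue
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, ct_slice.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, ct_slice.shape[1])
    return masked[y0:y1, x0:x1]
```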
- Among non-invasive colorectal cancer (CRC) screening approaches, Computed Tomography Colonography (CTC), also known as Virtual Colonoscopy (VC), is among the most accurate. This work proposes an AI-based polyp detection framework for VC. Two main steps are addressed: automatic segmentation to isolate the colon region from its background, and automatic polyp detection. We also evaluate the performance of the proposed framework on low-dose Computed Tomography (CT) scans. We build on our visualization approach, Fly-In (FI), which provides "filet"-like projections of the internal surface of the colon; its demonstrated usefulness to gastroenterologists holds great promise for combating CRC. In this work, these 2D FI projections are fused with the 3D colon representation to generate new synthetic images, which are used to train a RetinaNet model to detect polyps. The trained model achieves a 94% F1-score and 97% sensitivity. Furthermore, we study the effect of dose variation in CT scans on the performance of the FI approach in polyp visualization. A simulation platform is developed for CTC visualization with FI on both regular and low-dose CTC, using a novel AI restoration algorithm that enhances low-dose CT images so that a 3D colon can be successfully reconstructed and visualized with FI. Three senior board-certified radiologists evaluated the framework: at a peak voltage of 30 kV the average relative sensitivity of the platform was 92%, whereas a 60 kV peak voltage produced an average relative sensitivity of 99.5%. A rough sketch of a generic low-dose simulation follows this entry.
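The paper's dose-simulation platform is not public, but as a rough, generic sketch of how dose reduction is often emulated, the code below injects Poisson (photon-counting) noise in the image domain. This is a crude approximation: realistic pipelines add noise in the projection (sinogram) domain, and changing the peak voltage also alters tissue contrast, which this sketch ignores entirely. All constants are illustrative.

```python
import numpy as np

def simulate_low_dose(ct_slice_hu, dose_factor=0.25, i0=1e5, rng=None):
    """ct_slice_hu: 2-D slice in Hounsfield units; dose_factor < 1 lowers the dose."""
    rng = rng or np.random.default_rng()
    mu = np.maximum((ct_slice_hu + 1000.0) / 1000.0 * 0.019, 0.0)  # HU -> attenuation (~0.019/mm for water)
    photons = dose_factor * i0 * np.exp(-mu)     # expected photon counts per pixel
    noisy = np.maximum(rng.poisson(photons), 1)  # quantum noise; clamp to avoid log(0)
    mu_noisy = -np.log(noisy / (dose_factor * i0))
    return mu_noisy / 0.019 * 1000.0 - 1000.0    # back to Hounsfield units
```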